Evaluating the Impact of Information Distortion on Normalized Compression Distance

Authors

  • Ana Granados
  • Manuel Cebrián
  • David Camacho
  • Francisco B. Rodríguez

Abstract

In this paper we apply different techniques of information distortion to a set of classical books written in English. We study the impact of these distortions on the Kolmogorov complexity and on the clustering by compression technique (the latter based on the Normalized Compression Distance, NCD). We show how the complexity of these books can be decreased by introducing several modifications into them, and we use a clustering error measure to assess how much of the information contained in each book is preserved. We find experimentally that the clustering error is best preserved by modifying the most frequent words. We describe these distortions in detail and compare them with other kinds of modifications, such as random word distortions and infrequent word distortions. Finally, some phenomenological explanations of the different empirical results that have been carried …
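
As an illustration only, the two ingredients the abstract combines can be sketched in Python. The first is the standard NCD, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s and serves as a practical upper-bound approximation of Kolmogorov complexity. The second is a distortion that masks the most frequent words. Several details are assumptions made for this sketch and are not necessarily the authors' exact setup: bzip2 as the compressor C, the tokenization regex, and replacing every character of a masked word with an asterisk. The helper names `compressed_size`, `ncd`, and `distort_frequent_words` are hypothetical.

```python
import bz2
import re
from collections import Counter

# Assumed tokenization: runs of ASCII letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def compressed_size(data: bytes) -> int:
    """Compressed length C(data); bzip2 is an assumed choice of compressor."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distort_frequent_words(text: str, n_words: int) -> str:
    """Hypothetical distortion: mask the n_words most frequent words.

    Words are counted case-insensitively. Every character of a selected
    word is replaced by '*' (an assumed masking rule), so word lengths,
    punctuation and spacing are left unchanged.
    """
    counts = Counter(token.lower() for token in WORD_RE.findall(text))
    targets = {word for word, _ in counts.most_common(n_words)}

    def mask(match: re.Match) -> str:
        word = match.group(0)
        return "*" * len(word) if word.lower() in targets else word

    return WORD_RE.sub(mask, text)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(distort_frequent_words(sample, 2))  # *** cat *** *** dog *** *** bird
```

To study distortion along the lines the abstract describes, one could mask an increasing number of frequent words in each book. One would then track `compressed_size` as a rough proxy for the decrease in complexity, and recompute pairwise `ncd` values before and after distortion to feed a clustering step. Whether the resulting clusters survive the distortion is the question the paper evaluates; this sketch makes no claim about its results.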

Publication date: 2008